Specifications

1. Title of the Invention: Processing System Using Multiple Line Cache DRAM

2. Scope of the Patent Claims

(1) A memory device comprising an array of individual memory cells operatively arranged in
rows and columns, and a buffer which receives and stores data signals from a row of memory
cells of said array, wherein this buffer is divided into more than one block.

3. Detailed Explanation of the Invention 

Sphere of Industrial Use

This invention relates to a semiconductor memory device; more specifically, it relates to a
dynamic random access memory array using the static column decode (SCD) design and to a
system using such a device.

Prior Art Technology and Problem Areas 

In recent years, as data processing systems have been subjected to increasingly varied demands,
their functions and overall performance have improved. As processors, in particular
microprocessors, have become more powerful and very fast, systems are able to operate at
very high speeds. Memory, on the other hand, has not become correspondingly faster, although
its bit capacity has increased many times and the cost per bit has fallen. This applies in
particular to dynamic random access memory (DRAM). Many methods have therefore been proposed
and developed to enable access to high-density memory at a speed better matched to the
processor, so that data can be fetched, used and returned by a microprocessor. According to
one of these methods, a cache memory is used to store one portion of the data from the main
memory device. This method can be successful provided that at least two conditions are met.
One is that the access time of the cache memory must be much shorter than the access time of
the main memory. The other is that the portion of the data stored in the cache memory must
have a very high probability of being accessed by the microprocessor; such an access is
called by the special term "hit".

Implementations of such cache memory devices have been developed in this technical sphere.

Because static random access memory (SRAM) devices have fast access times when
compared to DRAM memory, they have been used for cache memory. For example, while the
typical DRAM access time is 120 nanoseconds, the SRAM access time is generally 20 to 40
nanoseconds. However, the chip area required per bit in current SRAM devices is large, which
makes them extremely unsuitable for high-density main storage devices. In addition,
SRAM devices generally consume much more energy than DRAM devices.

However, it has been proposed that an SRAM cache memory be located on the DRAM memory
array chip. This method provides a partial solution to the speed problems which occur when
DRAM is accessed. It has the following disadvantages: 1) It has been believed that a
relatively large cache must be constructed in order to increase the probability of a hit. Due
to the space needed for SRAM cells, the occupied space is above the allowable limit. 2) The
logic and the register support required to realize the cache memory device take up a very
large amount of physical space on the chip. Such an increase in occupied space is probably
not allowable on the DRAM chip, and if an off-chip arrangement is used, bus-compatible
connections are required and, because most parallel communication must be given up, the
advantages of an on-chip arrangement are lost.

An article by Goodman and Chiang, "The Use of Static Column RAM as a Memory
Hierarchy", The 11th Annual Symposium on Computer Architecture, IEEE Computer Society
Press, (1984), pages 167 - 174, proposed using the sense amplifier row of a current static
column decode DRAM device, that is, its static row buffer, as a cache memory. Since static
row buffers are already present in such devices, this solves the problem of the space usage
above an acceptable limit that arises with low-density SRAM cache memory. However, although
this method provides a number of cache memory cells equal to the number of DRAM array
columns, the problem is that only one row of cache memory is provided. Therefore, the
probability of a "hit" is generally not very high.

Goodman and Chiang also proposed as an improvement the use of "by 2" or "by 4" memory
devices instead of a "by 1" memory device. In other words, to obtain for example a capacity
of 1 M bit, instead of one device having a single DRAM array of 1,024 x 1,024 memory cells
with one static row buffer 1,024 memory cells in length, a device having four 256K bit arrays
is used, each having a static row buffer 512 cells in length. This construction provides four
individually accessible "cache" rows because four individual static row buffers are used.
However, this solution has drawbacks. These "by 4" devices are generally more costly
than "by 1" devices, and it is difficult to ensure error correction using standard error
correction codes and procedures. Since such "by 4" devices require many more I/O pins than
"by 1" devices, a larger package is required. "By 4" devices also require many more on-chip
addressing functions than "by 1" devices, and because four individual static buffers are
contained, twice as much space is needed when compared to "by 1" devices.

Means and Operation for Resolution of Problem Areas 

It is widely known that in order to achieve a high hit ratio with cache memory systems, a
large cache is required, which means many memory cells.
However, the inventors have discovered through statistical model analysis and by running real
software that, with conventional microprocessors which run conventional software, the
number of separate segments of the memory array held in the cache system is much more
important for the hit ratio of the cache than the length of each segment. For example, a line
of 1,024 memory cells, cached from a single memory array row, does not have a much greater
probability of a hit during a conventional processing run than a cache line of 512 memory
cells in length, or even a cache line of 256 memory cells in length. This is apparently
because conventional microprocessors and software very frequently access consecutive memory
locations, but these sequences are interrupted when random accesses are required and when
two- and three-address mode instructions are executed in memory operations. These
interruptions naturally cause dumping of all the cache columns in a single line cache system.

The result of this analysis is that a cache with a total length of, for example, 1,024
memory cells which consists of two separately stored and accessed blocks or sections, each
512 memory cells in length, has a higher chance of a hit than a single block of the same
total length which is stored and accessed as a whole. Moreover, with four such blocks, each
256 memory cells in length, a much higher hit ratio is achieved than with either the one or
the two block construction described above, although the total number of memory cells remains
constant. The more blocks there are, with fewer cells per block, the more the cache hit rate
increases; however, when the number of blocks reaches about 16, the logic and control
required to address and access each block become a burden relative to the gain in hit ratio.
Nevertheless, it can be expected that many more blocks can be realized with further
improvement of this control.
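
The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the inventors' statistical model: the function names, the workload (three interleaved sequential streams in different rows) and the direct mapping of blocks by column address are all hypothetical choices made for illustration.

```python
def hit_ratio(trace, num_blocks, total_cells=1024):
    """Replay (row, column) accesses against a row buffer split into equal blocks.

    Each block remembers the row (its TAG) it was last loaded from. The block
    is chosen by the column address, i.e. direct mapping (an assumption here).
    """
    block_len = total_cells // num_blocks
    tags = [None] * num_blocks          # row currently held by each block
    hits = 0
    for row, col in trace:
        block = col // block_len
        if tags[block] == row:
            hits += 1
        else:
            tags[block] = row           # miss: reload only this block
    return hits / len(trace)


def interleaved_trace(streams, length):
    """Round-robin over sequential streams, each given as (row, start_column)."""
    return [(row, start + i) for i in range(length) for row, start in streams]


# Hypothetical workload: three streams (for example instructions, data and a
# stack), each walking consecutive columns of a different row.
trace = interleaved_trace([(3, 0), (7, 512), (9, 768)], length=64)
for blocks in (1, 2, 4):
    print(blocks, "block(s):", round(hit_ratio(trace, blocks), 3))
```

With this particular trace the single-block buffer never hits, because every access dumps the whole line, while the four-block buffer misses only on the first access of each stream. Real software will show less extreme differences.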

The present invention provides a conventional RAM array having a static row buffer
which is functionally extended over the width of the device, wherein this static buffer is
divided into two or more blocks or sections. These blocks or sections provide the RAM array
with multiple cache lines which can be accessed without addressing the array itself.

The present invention also provides a data processing system using a DRAM whose cache line
is divided into multiple sections or blocks.

According to this invention, unacceptable chip space is not required to create a workable
cache.

Furthermore, according to this invention, the cache is established on the chip so that
parallel movement of data signals can be easily achieved.

Further, according to this invention, a multiple line cache is established without having to
use "by 2", "by 4", or "by n" devices.

Also, according to the present invention, a cache memory device is provided which
maintains a high "hit probability".

Further, because the present invention uses static RAM elements, fast access to data
signals is achieved.



These and other advantages specific to this invention will be evident from the explanation 
and figures below. 

Embodiments 

Figure 1 is a block diagram of a conventional static column decode dynamic random
access memory array 100. Dynamic random access memory cells, arranged in n rows and m
columns, form a memory cell array 20, which is connected through m parallel connection paths,
schematically represented with the reference symbol 5, to a static column decode (SCD)
buffer 15. A column data multiplexer 25 communicates with the SCD buffer 15. The column data
multiplexer 25 has address input lines A0 - A10, shown for example as device bus 26, and the
row address multiplexer 27 likewise has A0 - A10 as inputs. The static column decode dynamic
random access memory array, which is well known from prior art, operates in a data
processing system in a conventional manner to store and access data.




The system and the conventional operations of the well known SCD DRAM device form no part
of the present invention, except as modified in this explanation. The system embodying the
present invention is explained with reference to Figure 2. This system includes a
central processing unit (CPU) 1 having an address bus 2 connected to a cache/DRAM controller 3
and a row/column address multiplexer 5. The cache/DRAM controller 3 has a MISS signal output
11 connected back to CPU 1, a row/column address output 4 connected to the row/column
address multiplexer 5, a row address strobe output 8 and a column address strobe output 9,
which are respectively connected to several DRAM devices 7. The row/column address
multiplexer 5 has as output a multiplexed row/column address bus 6, which is also connected
to each of the several DRAM devices 7. The DRAM devices 7 output data to CPU 1 via a CPU
data bus 10.

The cache/DRAM controller shown as block 3 will now be further explained with reference
to Figure 3. The CPU address bus 2 is connected to a comparator 34 and to a block address
demultiplexer 31. The block address demultiplexer 31 is operationally connected to the
separate registers of a TAG register file 32, which stores the row address for each block of
the segmented static column buffer. The TAG register file 32 communicates with the
comparator 34 through the TAG address bus 33.

The comparator 34 outputs a MISS signal through an output line 11 back to CPU 1 and to a
DRAM controller 35. While the operation of the DRAM controller 35 will not be explained in
detail as it is well known in this sphere of technology, it includes an output 8 for row
address strobe (RAS), an output 9 for column address strobe (CAS), and a row/column address
selector 4. The RAS and CAS signal lines are connected to each DRAM device 7, and the
row/column address selector 4 is input to the row/column address multiplexer 5.

The DRAM device of the present invention will now be explained with reference to
Figure 4. The DRAM contains a charge array 71, which is configured according to prior art.
However, the array may also be modified into a static random access array within
the scope of the present invention. The time multiplexed row/column signal is input to the
device via a bus 6. A timing and control circuit 76 receives RAS signal 8 and CAS signal 9
and the other control signals required for the operation of the DRAM, for example a
READ/WRITE signal, which are not shown here for the sake of simplicity. A static column
buffer 72, generally comprising sense amplifier cells, is in parallel communication with the
charge array 71 of the present invention through the circuit lines 75. It should be noted
that the charge array 71 of the present invention is shown in the figure segmented into four
blocks. This segmentation need not be physical, since in practice the charge array, which is
functionally composed of n rows of memory cells arranged in m columns, is usually
not physically divided into blocks. The division lines are shown only to explain the
operation of the device. Similarly, the static column buffer 72 is shown divided into four
individual blocks. These lines are used to explain the operational division rather than to
indicate a physical separation of the static column buffer 72. Each operational block of the
static column buffer 72 is connected to a block address demultiplexer and control circuit 73,
and to a column address control and multiplexer circuit 74. The column address control and
multiplexer circuit 74 is connected to CPU data bus 10.

The operation of the system will now be explained with reference to Figure 2 through
Figure 5. When, for example, a memory READ cycle is started by the CPU 1, the
cache/DRAM controller 3 normally selects the column address from the CPU address bus 2
through the line 4 connected to the row/column address multiplexer 5. The address is
segmented into several fields, for example as shown in Figure 5. The selected address is
multiplexed onto the DRAM address bus 6.
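
The segmentation of the CPU address into fields might be sketched as follows. Figure 5 is not reproduced in this text, so the field widths and their order are assumptions: a 1 M bit device with 1,024 rows, four buffer blocks and 256 cells per block, with the row field in the high-order bits. The names `AddressFields` and `split_address` are hypothetical.

```python
from typing import NamedTuple

ROW_BITS = 10      # 1,024 rows (assumed)
BLOCK_BITS = 2     # 4 buffer blocks (assumed)
OFFSET_BITS = 8    # 256 cells per block (assumed)


class AddressFields(NamedTuple):
    row: int      # compared with the stored TAG
    block: int    # selects the buffer block and its TAG register
    offset: int   # selects the cell within the block

    @property
    def column(self) -> int:
        """Full column address sent to the DRAM (block and offset together)."""
        return (self.block << OFFSET_BITS) | self.offset


def split_address(addr: int) -> AddressFields:
    """Split a CPU address into row, block and offset fields (assumed layout)."""
    if not 0 <= addr < 1 << (ROW_BITS + BLOCK_BITS + OFFSET_BITS):
        raise ValueError("address out of range")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    row = addr >> (OFFSET_BITS + BLOCK_BITS)
    return AddressFields(row, block, offset)
```

In this layout the block field is simply the upper part of the column address, so consecutive addresses stay within one block until the offset wraps.
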

RAS and CAS are by default active on lines 8 and 9, so that, as is well known, the DRAM is
in the static column access mode. The cache/DRAM controller 3 decodes the block address field
from the CPU address and selects the register of the TAG register file 32 which is uniquely
related to the decoded block address, that is, to the addressed block of the static column
buffer. The TAG register contains, naturally, the row address of the row of the
charge array 71 from which the data presently in that block of the static column buffer
72 was sensed. The TAG address from the TAG register file 32 is output to the comparator 34,
which compares it with the row address field of the CPU address input to the comparator 34
from bus 2. If the row address is equal to the TAG address, this indicates a cache hit. If
the addresses are not equal, this is a cache miss and the comparator outputs the MISS signal
on the miss line 11.
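
The TAG lookup and comparison can be sketched in a few lines. This is a minimal model, assuming four blocks and an empty reset state for the registers; the class and method names are hypothetical and do not come from the specification.

```python
class TagRegisterFile:
    """One TAG register per buffer block, holding the row that block was loaded from."""

    def __init__(self, num_blocks: int = 4):
        # None marks a block that has not been loaded yet (assumed reset state).
        self.tags = [None] * num_blocks

    def is_miss(self, row: int, block: int) -> bool:
        """Comparator: True corresponds to driving the MISS line, False to a hit."""
        return self.tags[block] != row

    def load(self, row: int, block: int) -> None:
        """Record that `block` of the static column buffer now holds data from `row`."""
        self.tags[block] = row
```

A hit therefore depends only on the row held by the addressed block; the rows held by the other blocks are not consulted.
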

If a hit is indicated, no further operation of the cache/DRAM controller is required. This is
because, while the controller 3 has been performing these operations, the DRAM has used the
column address to select the data bit from the static column buffer 72. The data is moved
from the DRAM to the CPU data bus 10 and this memory cycle of the CPU is completed.

When a miss is detected, the MISS signal is output by the controller 3 on the miss line 11,
signalling the CPU 1 to wait for data. If the system timing creates a state in which
data is already on the data bus, the data on the data bus 10 will be ignored. The MISS signal
is also sent to the DRAM controller 35, and the DRAM controller 35 performs operations to
resolve the miss according to the method described below. The RAS signal on line 8 and the
CAS signal on line 9 are deactivated to cause precharging of the DRAM, as is well known.
The DRAM controller 35 causes the row/column multiplexer 5 to send the row address field to
the DRAM, and the RAS signal is activated on line 8. As a result, the DRAM senses all the
data of the addressed row of the array 71 and sends it to the static column buffer 72. Next,
the DRAM controller 35 causes the row/column multiplexer 5 to multiplex the column address
field to the DRAM and activates CAS on line 9. The DRAM decodes the block address with the
block address demultiplexer and control circuit 73 so that the corresponding block of the
static buffer 72 is loaded with the corresponding data block from the array 71. The other
blocks of the static buffer are not loaded. The DRAM controller then loads the new row
address into the corresponding register of the TAG register file 32. The column address field
is then used to output the correct data bits to the CPU data bus 10, and the cache/DRAM
controller signals the CPU to receive the data.
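
The READ cycle, including the miss sequence above, can be walked through with a toy behavioural model. It is a sketch only: the array size, the logged signal names and the class `MultiLineCacheDram` are assumptions, timing is ignored, and each cell stores an integer identifying its position so that the source row of the data is visible.

```python
class MultiLineCacheDram:
    """Toy model of a DRAM whose static column buffer is split into blocks."""

    def __init__(self, rows=8, cols=16, num_blocks=4):
        # Each cell holds row * cols + col purely for illustration.
        self.array = [[r * cols + c for c in range(cols)] for r in range(rows)]
        self.block_len = cols // num_blocks
        self.buffer = [None] * cols        # static column buffer
        self.tags = [None] * num_blocks    # TAG register per block
        self.log = []                      # sequence of events, for inspection

    def read(self, row, col):
        block = col // self.block_len
        if self.tags[block] == row:
            self.log.append("HIT")
        else:
            self.log += [
                "MISS",
                "RAS, CAS inactive (precharge)",
                "row address, RAS active",
                "column address, CAS active",
            ]
            lo = block * self.block_len
            # Only the addressed block is loaded; other blocks keep their data.
            self.buffer[lo:lo + self.block_len] = self.array[row][lo:lo + self.block_len]
            self.tags[block] = row         # update the TAG register
        return self.buffer[col]
```

For example, after `read(2, 5)` misses and loads block 1 from row 2, a later `read(3, 9)` loads block 2 from row 3 without disturbing block 1, so `read(2, 7)` still hits.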

The operation of the system using the invention has been explained with respect to the
READ cycle. A WRITE cycle can be executed by a conventional method known in
this technological field, essentially without regard to the configuration of the multiple
line buffer. However, the cache/DRAM controller 3 will be required to update the TAG register
file whenever the block data stored in the static buffer 72 is updated.

Various modifications of the preferred embodiments explained
here can also be realized within the scope of the present invention. These modifications
include but are not limited to the examples described below. The various elements described
above, such as the TAG register file 32, comparator 34, or DRAM controller 35, may be
associated with or included in each memory device 7. Naturally, such inclusion requires
duplication of these circuits, which may not be acceptable for systems with multiple memory
devices. As explained above, the memory device used with the system of the present invention
need not be a DRAM device.
The logic and control circuits may include the capability to determine whether
the data of the static buffer 72 is replaced or retained. Buses 2 and 10 may be electrical,
optical or other electromagnetic buses. The comparison of the TAG address to the CPU row
address may be performed in different ways, such as by marking each row of the array with a
block TAG code. Instead of direct mapping to a specified block of the array, cache blocks can
be associated with any of the blocks of the array as determined by a logical operation to
increase the hit rate of the cache. This is called a set-associative method. Other
modifications within the scope of the patent claims are described below.
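
One possible form of the associative mapping mentioned above is sketched here. The specification does not say which logical operation is used, so this is an assumption throughout: the sketch lets any buffer block hold any (row, array segment) pair and replaces the least recently used block, and the name `AssociativeTags` is hypothetical.

```python
from collections import OrderedDict


class AssociativeTags:
    """Any buffer block may hold any (row, segment) pair; LRU replacement (assumed)."""

    def __init__(self, num_blocks=4):
        self.num_blocks = num_blocks
        # Maps (row, segment) to a buffer block index, least recently used first.
        self.entries = OrderedDict()

    def access(self, row, segment):
        """Return (hit, buffer_block) for an access to `segment` of `row`."""
        key = (row, segment)
        if key in self.entries:
            self.entries.move_to_end(key)          # mark as most recently used
            return True, self.entries[key]
        if len(self.entries) < self.num_blocks:
            slot = len(self.entries)               # a block is still free
        else:
            _, slot = self.entries.popitem(last=False)   # evict the LRU entry
        self.entries[key] = slot
        return False, slot
```

Under direct mapping, two rows that are accessed in the same array segment would repeatedly displace each other; in this sketch both can be held at once as long as a buffer block is available.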

(1) A memory device, including: an array of individual memory cells arranged operatively in
rows and columns, and a buffer which receives and stores data signals from memory cell rows
of said array of individual memory cells, wherein this buffer is segmented into more than one
block.

(2) The memory device described in claim 1, wherein the memory device is a static column
decode dynamic random access memory.

(3) The memory device described in claim 1, wherein the buffer includes static random 
access memory cell rows. 

(4) The memory device described in claim 1, further including a means assigning a function 
to each individual block of the buffer. 

(5) The memory device described in claim 1, wherein the buffer further includes a single
operative line of n memory cells, wherein n corresponds to the number of the array columns,
divided into s sections, wherein s is greater than 1.

(6) The memory device described in claim 5, wherein s equals 4, and each section contains
n/4 memory cells.

(7) The memory device described in claim 5, wherein s equals 8, and each section contains
n/8 memory cells.

(8) A data processing system, including: a central processing device; at least one memory
device having arrays of memory cells arranged operationally in n rows and m columns, wherein
at least one said memory device has a buffer containing at least m memory cells, said
buffer is operationally connected to said arrays of memory cells, and said buffer is
operationally divided into more than one section; a cache memory control means for the
control of said memory device; at least one address bus connected to said central processing
device, said cache memory control means and at least one said memory device; and a data
bus connected to said central processing device and to at least one said memory device.

(9) The system described in claim 1, wherein said cache memory control means comprises a
function-assigning means which assigns individual specified functions to the more than one
block so that data is stored in groups consisting of specified rows of said arrays.

(10) The system described in claim 2, wherein said cache memory control means further
includes a register file for storing addresses corresponding to more than one of said blocks
in said buffer, said addresses corresponding to the rows of said array, and a comparator
which compares said addresses to the row addresses obtained from said address bus and
provides an output indicating the result of the comparison.

(11) The data processing system described in claim 1, wherein said buffer is operationally 
divided into four separate blocks. 

(12) A random access memory device (7) using a static buffer (72) as a cache to speed up
access to data elements obtained from the device. The static buffer (72) is operationally
divided into two or more individual blocks, and each block holds data from a different row
of the array. By functionally dividing a single buffer into several blocks, the probability
of a cache "hit" is greatly increased and faster access from the buffer is achieved. The
control device (3) stores the row address (TAG) of each of the multiple blocks, compares that
address to the row address of the desired data, and generates a signal which indicates the
result of this comparison.
The random access memory array having a multiple line cache configuration is used in a data
processing system including: a CPU (1), address and data buses (2, 10, 11), control logic
(3), and a multiplexer (5).

4. Brief Explanation of Figures 

Figure 1 is a block diagram showing a conventional SCD DRAM according to prior art
technology.

Figure 2 is a block diagram explaining the functions of the data processing system 
according to the present invention. 

Figure 3 is a more detailed block diagram of the cache/DRAM control device shown in 
Figure 2. 

Figure 4 is a more detailed diagram showing the functions of the multiple cache line 
DRAM in Figure 2 of the present invention. 

Figure 5 shows the fields of the CPU address.

Explanation of Main Symbols 

1: central processing unit
2: CPU address bus
3: cache/DRAM control device
5: row/column multiplexer
7: DRAM device
10: CPU data bus
32: TAG register file
33: TAG address bus
34: comparator
35: DRAM control device
71: charge array
72: static column buffer

Representative: Akira ASAMURA, patent attorney. 

Figure 1

Figure 2
1: CPU (central processing device)
3: cache/DRAM control device

Figure 3
32: register
35: DRAM control device